First things first, load in all the packages you’ll need throughout this process.

packages_needed <- c("ggplot2", # graphics
                     "arm", # display() etc.
                     "ggfortify",
                     "rmdformats")
pk_to_install <- packages_needed [!( packages_needed %in% rownames(installed.packages())  )]
if(length(pk_to_install)>0 ){
  install.packages(pk_to_install,repos="http://cran.r-project.org")
}


library(ggplot2)
library(arm)
## Loading required package: MASS
## Loading required package: Matrix
## Loading required package: lme4
## 
## arm (Version 1.11-2, built: 2020-7-27)
## Working directory is D:/Documents/Accademics/APSU/AdvancedData_Class/2021_09-14_AdvancedData_GLM_Assignment
library(ggfortify)
library(rmdformats)

Pull in data for both analyses. I am using two of my own datasets, one which contains standardized measurments for Heloderma across their range, and another with records from Vernal Pool monitering across the northeastern part of the US.

Heloderma_Records <- read.csv("data/Heloderma_Records.csv")
Vernal_Pool_Database <- read.csv("data/Mass_Pool_Table.csv")

Gila Analysis

Start by subsetting all our information down to something we can work with. Gila monsters are notoriously hard to sex based exclusively on external characteristics, although there have been some trends observed. There have been some researchers who have reported that Male gilas tend to have broader heads than females. While this is always discribed as an imperfect means of determining sex, we can test if there is any truth to this pattern.

We subset out only monsters who have been positively sexed as “Male” or “Female” in our database, and then additionally only those monsters who have had “Head Width” measured. Then, we make a new collumn in our data “Bi_Sex” which turns our “Male” and “Female” values into 1 and 0 values so that we can apply a binary distribution to it. I also removed all juviniles and sub adults based off size thresholds set by Beck (2005) as a means of simplifying the dataset (this method is said to work much better on Adults)

Gila_Data <- Heloderma_Records[Heloderma_Records$Species == "suspectum",]
Gila_Data <- Gila_Data[is.na(Gila_Data$HW) == FALSE,]
Gila_Data <- subset(Gila_Data, Sex == "M" | Sex == "F")
Gila_Data <- Gila_Data[Gila_Data$HW != 0,]
Gila_Data <- Gila_Data[Gila_Data$SVL > 220,]

for(i in 1:dim(Gila_Data)[1]){
  Gila_Data$Bi_Sex[i] <- ifelse(Gila_Data$Sex[i] == "F", 0, 1)}

Heres a photo of us measuring “Head Width” with calipers

Below, we’ll define our model and plot our data using a binomial distribution, since this data consists of 0s and 1s. By default we use a logit link.

Gila_Sex_v_HW_BinomialModel <- glm(Bi_Sex~HW, data=Gila_Data, binomial(link="logit"))

ggplot(Gila_Data,aes(HW,Bi_Sex)) +
  geom_point() + 
  geom_smooth(method="glm", method.args=list(family="binomial"(link="logit"))) +
  geom_hline(yintercept = mean(Gila_Data$Bi_Sex), linetype = "dashed") +
  ylab("Likelyhood of Monster Being Male (1) or Female (0)") +
  xlab("Head Width (mm)") +
  labs(title="Binomial Sex v. Head Width Graph")
## `geom_smooth()` using formula 'y ~ x'

display(Gila_Sex_v_HW_BinomialModel)
## glm(formula = Bi_Sex ~ HW, family = binomial(link = "logit"), 
##     data = Gila_Data)
##             coef.est coef.se
## (Intercept) -7.23     1.29  
## HW           0.17     0.03  
## ---
##   n = 290, k = 2
##   residual deviance = 360.8, null deviance = 400.4 (difference = 39.5)

The resulting graph does in fact show a logistic relationship between Sex and head width, where as we increase head size, the likelyhood of a monster being male also increases. The average likelyhood for any given monster with a head width of 50mm being a male is about 75% based on this analysis.

x <- predict(Gila_Sex_v_HW_BinomialModel)
y <- resid(Gila_Sex_v_HW_BinomialModel)
binnedplot(x, y)

We can plot our residuals too.

coef(Gila_Sex_v_HW_BinomialModel)
## (Intercept)          HW 
##  -7.2330379   0.1670862
confint(Gila_Sex_v_HW_BinomialModel)
## Waiting for profiling to be done...
##                 2.5 %     97.5 %
## (Intercept) -9.856486 -4.7863914
## HW           0.111852  0.2264593
summary(Gila_Sex_v_HW_BinomialModel)
## 
## Call:
## glm(formula = Bi_Sex ~ HW, family = binomial(link = "logit"), 
##     data = Gila_Data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7391  -1.0857   0.5638   1.0070   1.8603  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -7.23304    1.29012  -5.606 2.06e-08 ***
## HW           0.16709    0.02916   5.729 1.01e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 400.35  on 289  degrees of freedom
## Residual deviance: 360.85  on 288  degrees of freedom
## AIC: 364.85
## 
## Number of Fisher Scoring iterations: 4
0.1670862/4
## [1] 0.04177155

If we take a look at the summeries for our binomial glm, we can observe a handful of things. First, we see that our 95% CRI do not contain zero, meaning that we can be confident that there is an effect from these variables.

If we do some rough math using Gelman and Hill’s method, we can make an estimate that on average, an increase of 1mm in head width increases an individuals likelyhood of being a male by just above 4%.

Vernal Pool Analysis

Next lets take a look at some vernal pool data. We have data from over 400 vernal pools across the eastern United States, which have been monitered over the course of 15 years. Data collected includes pool attributes (like size and depth) and obligate pool breeding amphibian abundances (Chiefly Spotted Salamanders and Wood Frogs). Davis et al. 2019 used pool depth as a stand in for pool suitability for pool breeding amphibians, here I want to run a simple analysis to show if pool depth does in fact have a significant impact on the abundance of pool-breeding amphibians at a given pool.

First, lets visualize our data so that we can set a distribution.

Vernal_Pool_Database_nonNA <- subset(Vernal_Pool_Database, is.na(Max_RSYL) == FALSE)

hist(Vernal_Pool_Database_nonNA$Max_AMAC, breaks = 50, col = "lightgoldenrod1", xlab = "A. maculatum Abundance", main = "Spotted Salamander Abundance Histogram")

hist(Vernal_Pool_Database_nonNA$Max_RSYL, breaks = 50, col = "saddlebrown", xlab = "L. sylvaticus Abundance", main = "Wood Frog Abundance Histogram")

hist(Vernal_Pool_Database_nonNA$Max_Depth, breaks = 50, col = "turquoise4", xlab = "Max Pool Depth", main = "Pool Depth Histogram")

It looks like for all our data, particularly our amphibian counts, we can apply a poission distribution. This makes sense, as its count data, and the vast majority of counts will be on the lower end of the scale. Next, lets run a simple glm analysis using a poission distribution for our two species of interest.

Spotted Salamanders

ggplot(Vernal_Pool_Database_nonNA, aes(x = Max_Depth, y = Max_AMAC)) +
  geom_point(shape = 21, color = "black", fill = "lightgoldenrod1", alpha = .8) +
  geom_smooth(method="glm", method.args=list(family="poisson"(link="log")), color = "black") +
  xlab("Pool Depth (cm)") +
  ylab("Obverved A. maculatum Occupying Pool") +
  labs(title="Spotted Salamander Abundance at Vernal Pool Based on Pool Depth")
## `geom_smooth()` using formula 'y ~ x'

AMAC.glm <- glm(Max_AMAC ~ Max_Depth, data = Vernal_Pool_Database_nonNA, family = poisson)

anova(AMAC.glm)
## Analysis of Deviance Table
## 
## Model: poisson, link: log
## 
## Response: Max_AMAC
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev
## NULL                       4079     329106
## Max_Depth  1    65011      4078     264095
summary(AMAC.glm)
## 
## Call:
## glm(formula = Max_AMAC ~ Max_Depth, family = poisson, data = Vernal_Pool_Database_nonNA)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -35.089   -5.275   -4.378   -1.824   79.869  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 1.959e+00  5.877e-03   333.4   <2e-16 ***
## Max_Depth   1.981e-02  6.452e-05   307.1   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 329106  on 4079  degrees of freedom
## Residual deviance: 264095  on 4078  degrees of freedom
##   (51 observations deleted due to missingness)
## AIC: 272637
## 
## Number of Fisher Scoring iterations: 7
exp(1.959+.01981*100)
## [1] 51.4186

We see that pool depth does in face have a significant impact on Spotted Salamander abundances. We can use the slope and intercept estimates to make predictions on the expected number of AMAC at a pool of any given depth. At a pool of 1 meter in depth, we could expect to find on average 51 AMAC.

Wood Frogs

ggplot(Vernal_Pool_Database_nonNA, aes(x = Max_Depth, y = Max_RSYL)) +
  geom_point(shape = 24, color = "Black", fill = "saddlebrown", alpha = .8) +
  geom_smooth(method="glm", method.args=list(family="poisson"(link="log")), color = "black") +
  xlab("Pool Depth (cm)") +
  ylab("Obverved L. sylvaticus Occupying Pool") +
  labs(title="Wood Frog Abundance at Vernal Pool Based on Pool Depth")
## `geom_smooth()` using formula 'y ~ x'

RSYL.glm <- glm(Max_RSYL ~ Max_Depth, data = Vernal_Pool_Database_nonNA, family = poisson)

anova(RSYL.glm)
## Analysis of Deviance Table
## 
## Model: poisson, link: log
## 
## Response: Max_RSYL
## 
## Terms added sequentially (first to last)
## 
## 
##           Df Deviance Resid. Df Resid. Dev
## NULL                       4079     329674
## Max_Depth  1    36950      4078     292724
summary(RSYL.glm)
## 
## Call:
## glm(formula = Max_RSYL ~ Max_Depth, family = poisson, data = Vernal_Pool_Database_nonNA)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -27.537   -5.741   -4.888   -2.134   84.276  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 2.239e+00  5.763e-03   388.5   <2e-16 ***
## Max_Depth   1.608e-02  7.186e-05   223.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 329674  on 4079  degrees of freedom
## Residual deviance: 292724  on 4078  degrees of freedom
##   (51 observations deleted due to missingness)
## AIC: 301069
## 
## Number of Fisher Scoring iterations: 7
exp(2.239+.01608*100)
## [1] 46.8523
We can do this same analysis for Wood Frogs. We see that this trend is similar for wood frogs, however the impact of pool depth seems to be slightly less for RSYL. At a pool of 1 meter in depth, we could expect to find on average of 47 RSYL.